Mortgages


Incorporating data drift to perform survival analysis on credit risk

Peng, Jianwei, Lessmann, Stefan

arXiv.org Machine Learning

Survival analysis has become a standard approach for modelling time to default with time-varying covariates in credit risk. While most existing methods implicitly assume a stationary data-generating process, in practice mortgage portfolios are exposed to various forms of data drift caused by changing borrower behaviour, macroeconomic conditions, policy regimes and so on. This study investigates the impact of data drift on survival-based credit risk models and proposes a dynamic joint modelling framework to improve robustness in non-stationary environments. The proposed model integrates a longitudinal behavioural marker derived from balance dynamics with a discrete-time hazard formulation, combined with landmark one-hot encoding and isotonic calibration. Three types of data drift (sudden, incremental and recurring) are simulated and analysed on mortgage loan datasets from Freddie Mac. Experiments show that the proposed landmark-based joint model consistently outperforms classical survival models, tree-based drift-adaptive learners and gradient boosting methods in terms of discrimination and calibration across all drift scenarios, confirming the strength of the model design.
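The pipeline the abstract describes (a discrete-time hazard model, landmark one-hot encoding, isotonic calibration) can be pictured with a short sketch. The column names, the landmark grid, and the use of logistic regression as the hazard learner are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a discrete-time hazard with landmark dummies plus
# isotonic calibration. Column names and the landmark grid are assumptions.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.isotonic import IsotonicRegression

def fit_landmark_hazard(panel: pd.DataFrame, landmarks: list):
    """panel: one row per loan-period with columns
    ['loan_id', 'period', 'balance_marker', 'default'] (default in {0, 1})."""
    panel = panel.copy()
    # Assign each loan-period to its most recent landmark and one-hot encode it.
    panel["landmark"] = pd.cut(panel["period"], bins=landmarks + [np.inf],
                               labels=landmarks, right=False)
    X = pd.get_dummies(panel[["balance_marker", "landmark"]],
                       columns=["landmark"], dtype=float)
    y = panel["default"].to_numpy()

    hazard = LogisticRegression(max_iter=1000).fit(X, y)
    raw = hazard.predict_proba(X)[:, 1]
    # Isotonic regression maps raw hazard scores to calibrated probabilities.
    calibrator = IsotonicRegression(out_of_bounds="clip").fit(raw, y)
    return hazard, calibrator, list(X.columns)
```

In this reading, the landmark dummies let the hazard level shift at each landmark time, which is what gives the model room to adapt when the data-generating process drifts.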


The Architecture of Trust: A Framework for AI-Augmented Real Estate Valuation in the Era of Structured Data

Teikari, Petteri, Jarrell, Mike, Azh, Maryam, Pesola, Harri

arXiv.org Artificial Intelligence

The Uniform Appraisal Dataset (UAD) 3.6's mandatory 2026 implementation transforms residential property valuation from narrative reporting to structured, machine-readable formats. This paper provides the first comprehensive analysis of this regulatory shift alongside concurrent AI advances in computer vision, natural language processing, and autonomous systems. We develop a three-layer framework for AI-augmented valuation addressing technical implementation and institutional trust requirements. Our analysis reveals how regulatory standardization converging with AI capabilities enables fundamental market restructuring with profound implications for professional practice, efficiency, and systemic risk. We make four key contributions: (1) documenting institutional failures including inter-appraiser variability and systematic biases undermining valuation reliability; (2) developing an architectural framework spanning physical data acquisition, semantic understanding, and cognitive reasoning that integrates emerging technologies while maintaining professional oversight; (3) addressing trust requirements for high-stakes financial applications including regulatory compliance, algorithmic fairness, and uncertainty quantification; (4) proposing evaluation methodologies beyond generic AI benchmarks toward domain-specific protocols. Our findings indicate successful transformation requires not merely technological sophistication but careful human-AI collaboration, creating systems that augment rather than replace professional expertise while addressing historical biases and information asymmetries in real estate markets.
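As a rough structural reading of the paper's three-layer framework, the sketch below renders the layers as typed pipeline stages. All class and field names are hypothetical; the paper defines the layers conceptually, not as an API.

```python
# Illustrative skeleton of a three-layer valuation pipeline: physical data
# acquisition -> semantic understanding -> cognitive reasoning. Names are
# invented for illustration only.
from dataclasses import dataclass
from typing import Protocol

@dataclass
class PropertyRecord:                   # simplified structured (UAD-style) input
    photos: list
    description: str
    comparables: list

class PhysicalDataLayer(Protocol):      # sensors, imagery, data acquisition
    def acquire(self, address: str) -> PropertyRecord: ...

class SemanticLayer(Protocol):          # vision/NLP: raw record -> features
    def understand(self, record: PropertyRecord) -> dict: ...

class CognitiveLayer(Protocol):         # reasoning: features -> (value, uncertainty)
    def value(self, features: dict) -> tuple: ...

def appraise(address: str, physical: PhysicalDataLayer,
             semantic: SemanticLayer, cognitive: CognitiveLayer):
    """Run the layers in sequence; a human appraiser reviews the output,
    matching the paper's augment-rather-than-replace stance."""
    record = physical.acquire(address)
    features = semantic.understand(record)
    point_estimate, uncertainty = cognitive.value(features)
    return point_estimate, uncertainty
```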


Large-Scale Diverse Synthesis for Mid-Training

Zhang, Xuemiao, Tu, Chengying, Ren, Can, Weng, Rongxiang, Yan, Hongfei, Wang, Jingang, Cai, Xunliang

arXiv.org Artificial Intelligence

The scarcity of high-quality, knowledge-intensive training data hinders the development of large language models (LLMs), as traditional corpora provide limited information. Previous studies have synthesized and integrated corpora-dependent question-answering (QA) data to improve model performance but face challenges in QA data scalability and knowledge diversity, particularly in cross-domain contexts. Furthermore, leveraging our discipline and difficulty annotation system, we probe model deficiencies in STEM disciplines and on high-difficulty data. To overcome these limitations, we propose a novel diversified pipeline to synthesize BoostQA, a 100B-token large-scale QA dataset. Our synthesis framework: (1) curates seed data from heterogeneous sources; (2) utilizes DeepSeek-R1 to implement STEM-focused multi-grade synthesis to boost data diversity and high-difficulty synthesis to mitigate difficulty degradation; (3) refines answers via DeepSeek-V3 to improve output quality. We utilize BoostQA in mid-training, a mid-stage between pre-training and post-training, to optimize domain-specific knowledge acquisition and enhance data quality. Our method enables Llama-3 8B, mid-trained on a 40B-token dataset, to achieve an average improvement of 12.74% on MMLU and CMMLU and establish SOTA average performance across 12 benchmarks. BoostQA also demonstrates robust scalability, with performance consistently improving as model size, data volume, and initial FLOPs scale.
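The three-stage pipeline (seed curation, multi-grade synthesis, answer refinement) can be sketched as below. The wrapper functions are stubs standing in for DeepSeek-R1 and DeepSeek-V3 API calls; the prompts, grade labels, and function names are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch of the BoostQA pipeline: curate seeds -> STEM-focused
# multi-grade synthesis -> answer refinement. All names are hypothetical.

def curate_seeds(docs):
    """Stage 1 stub: keep knowledge-dense seed passages (toy heuristic)."""
    return [d for d in docs if len(d.split()) > 50]

def generate_with_r1(prompt):
    """Stage 2 stub: in the paper, DeepSeek-R1 synthesizes the question."""
    return f"[R1 output for: {prompt[:40]}...]"

def refine_with_v3(question):
    """Stage 3 stub: in the paper, DeepSeek-V3 refines the answer."""
    return f"[V3 refined answer for: {question[:40]}...]"

def synthesize_boostqa(docs,
                       grades=("middle-school", "undergraduate", "graduate")):
    qa = []
    for doc in curate_seeds(docs):
        for grade in grades:  # multi-grade synthesis targets diversity/difficulty
            q = generate_with_r1(f"Write a {grade} STEM question grounded in: {doc}")
            a = refine_with_v3(q)
            qa.append({"question": q, "answer": a, "grade": grade})
    return qa
```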


Transforming Credit Risk Analysis: A Time-Series-Driven ResE-BiLSTM Framework for Post-Loan Default Detection

Yang, Yue, Lin, Yuxiang, Zhang, Ying, Su, Zihan, Goh, Chang Chuan, Fang, Tangtangfang, Bellotti, Anthony Graham, Lee, Boon Giin

arXiv.org Artificial Intelligence

Prediction of post-loan default is an important task in credit risk management and can be addressed by detecting financial anomalies with machine learning. This study introduces a ResE-BiLSTM model that uses a sliding window technique, evaluated on 44 independent cohorts from the extensive Freddie Mac US mortgage dataset, to improve prediction performance. The ResE-BiLSTM is compared with five baseline models: Long Short-Term Memory (LSTM), BiLSTM, Gated Recurrent Units (GRU), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), across multiple metrics, including Accuracy, Precision, Recall, F1, and AUC. An ablation study was conducted to evaluate the contribution of individual components of the ResE-BiLSTM architecture. Additionally, SHAP analysis was employed to interpret the features the model relies on for its predictions. Experimental results demonstrate that ResE-BiLSTM achieves superior predictive performance compared to the baseline models, underscoring its practical value and applicability in real-world scenarios.
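A minimal sketch of a residual-enhanced BiLSTM over sliding-window loan histories is given below. The abstract does not specify the exact block layout of ResE-BiLSTM, so the residual 1-D convolutional front-end and the layer sizes here are assumptions for illustration.

```python
# Sketch: residual conv block feeding a bidirectional LSTM, binary logit head.
import torch
import torch.nn as nn

class ResEBiLSTM(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        # Residual 1-D convolution over the time axis (assumed design).
        self.conv = nn.Conv1d(n_features, n_features, kernel_size=3, padding=1)
        self.norm = nn.BatchNorm1d(n_features)
        self.bilstm = nn.LSTM(n_features, hidden, batch_first=True,
                              bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)    # default / no-default logit

    def forward(self, x):                       # x: (batch, window, n_features)
        z = x.transpose(1, 2)                   # -> (batch, n_features, window)
        z = torch.relu(self.norm(self.conv(z))) + z   # residual connection
        z = z.transpose(1, 2)                   # back to (batch, window, features)
        out, _ = self.bilstm(z)
        return self.head(out[:, -1])            # logit from the last time step

# Usage: sliding windows of 12 monthly observations, 20 features per month.
model = ResEBiLSTM(n_features=20)
logits = model(torch.randn(8, 12, 20))          # -> shape (8, 1)
```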


Binary AddiVortes: (Bayesian) Additive Voronoi Tessellations for Binary Classification with an application to Predicting Home Mortgage Application Outcomes

Stone, Adam J., Ogundimu, Emmanuel, Gosling, John Paul

arXiv.org Artificial Intelligence

The Additive Voronoi Tessellations (AddiVortes) model is a multivariate regression model that uses multiple Voronoi tessellations to partition the covariate space for an additive ensemble model. In this paper, the AddiVortes framework is extended to binary classification by incorporating a probit model with a latent variable formulation. Specifically, we utilise a data augmentation technique, where a latent variable is introduced and the binary response is determined via thresholding. In most cases, the AddiVortes model outperforms random forests, BART and other leading black-box regression models when compared using a range of metrics. A comprehensive analysis is conducted using AddiVortes to predict an individual's likelihood of being approved for a home mortgage, based on a range of covariates. This evaluation highlights the model's effectiveness in capturing complex relationships within the data and its potential for improving decision-making in mortgage approval processes.
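The latent-variable step described above follows the classic probit data-augmentation pattern (Albert-Chib style): draw a latent z from a truncated normal consistent with the observed label, then recover the binary response by thresholding at zero. The sketch below shows that one step; `f_hat` stands in for the AddiVortes ensemble's current predictions, and the tessellation updates of the full Gibbs sampler are omitted.

```python
# One probit data-augmentation step: z_i ~ N(f(x_i), 1), truncated to
# z_i > 0 if y_i = 1 and z_i <= 0 otherwise, so that y_i = 1{z_i > 0}.
import numpy as np
from scipy.stats import truncnorm

def sample_latent(y: np.ndarray, f: np.ndarray) -> np.ndarray:
    """Draw the latent variables given labels y and ensemble predictions f."""
    lower = np.where(y == 1, -f, -np.inf)   # truncation bounds, standardized
    upper = np.where(y == 1, np.inf, -f)
    return f + truncnorm.rvs(lower, upper)  # shift draws back to mean f

y_obs = np.array([0, 1, 1, 0])
f_hat = np.array([-0.3, 0.8, 0.1, -1.2])    # current ensemble predictions
z = sample_latent(y_obs, f_hat)             # latent draws feed the next Gibbs sweep
```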


Time Series Feature Redundancy Paradox: An Empirical Study Based on Mortgage Default Prediction

Huang, Chengyue, Yang, Yahe

arXiv.org Artificial Intelligence

With the widespread application of machine learning in financial risk management, conventional wisdom suggests that longer training periods and more feature variables improve model performance. This paper, focusing on mortgage default prediction, empirically identifies a phenomenon that contradicts this conventional wisdom: in time series prediction, increasing the training data timespan and adding non-critical features actually leads to significant deterioration in prediction effectiveness. Using Fannie Mae's mortgage data, the study compares predictive performance across different time window lengths (2012-2022) and feature combinations, revealing that shorter time windows (such as single-year periods) paired with carefully selected key features yield superior prediction results. The experimental results indicate that extended time spans may introduce noise from historical data and outdated market patterns, while excessive non-critical features interfere with the model's learning of core default factors. This research not only challenges the traditional "more is better" approach in data modeling but also provides new insights and practical guidance for feature selection and time window optimization in financial risk prediction.
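The core experiment is a rolling-window comparison: train on windows of different lengths, test on the following year, and compare AUC. A compact sketch follows; the column names and the gradient-boosting learner are illustrative, not the paper's exact setup.

```python
# Rolling-window AUC comparison: short vs. long training windows.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

def windowed_auc(df: pd.DataFrame, features: list,
                 test_year: int, window_years: int) -> float:
    """Train on [test_year - window_years, test_year - 1], test on test_year."""
    train = df[df["year"].between(test_year - window_years, test_year - 1)]
    test = df[df["year"] == test_year]
    clf = GradientBoostingClassifier().fit(train[features], train["default"])
    return roc_auc_score(test["default"],
                         clf.predict_proba(test[features])[:, 1])

# Usage (assuming a loan-level frame `df` with 'year' and 'default' columns):
# for year in range(2014, 2023):
#     print(year, windowed_auc(df, key_features, year, 1),   # single-year window
#                 windowed_auc(df, key_features, year, 10))  # decade of history
```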


Simulate and Optimise: A two-layer mortgage simulator for designing novel mortgage assistance products

Ardon, Leo, Evans, Benjamin Patrick, Garg, Deepeka, Narayanan, Annapoorani Lakshmi, Henry-Nickie, Makada, Ganesh, Sumitra

arXiv.org Artificial Intelligence

We develop a novel two-layer approach for optimising mortgage relief products through a simulated multi-agent mortgage environment. While the approach is generic, here the environment is calibrated to the US mortgage market based on publicly available census data and regulatory guidelines. Through the simulation layer, we assess the resilience of households to exogenous income shocks, while the optimisation layer explores strategies to improve the robustness of households to these shocks by making new mortgage assistance products available to them. Households in the simulation are adaptive, learning to make mortgage-related decisions (such as product enrolment or strategic foreclosure) that maximise their utility, balancing their available liquidity and equity. We show how this two-layer approach can successfully design mortgage assistance products that improve household resilience to exogenous shocks, with post-hoc analysis used to balance the costs of providing such products. Previously, such analysis could only be conducted through expensive pilot studies involving real participants, which demonstrates the benefit of the approach for designing and evaluating financial products.
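The two-layer structure can be illustrated with a toy version: an inner simulation of households facing income shocks, and an outer search over parameters of a hypothetical assistance product. Households here follow a fixed rule rather than the paper's adaptive learning agents, and the dollar amounts, shock probability, and the product's single `rate_cut` knob are invented for illustration.

```python
# Toy two-layer sketch: inner simulation, outer product-parameter search.
import random

def simulate(rate_cut: float, n_households: int = 1000, months: int = 120) -> float:
    """Inner layer: fraction of households that avoid foreclosure."""
    survived = 0
    for _ in range(n_households):
        liquidity = random.uniform(5_000, 30_000)
        payment = 1_500 * (1 - rate_cut)       # assistance lowers the payment
        for _ in range(months):
            income = 0.0 if random.random() < 0.01 else 4_000  # job-loss shock
            liquidity += income - payment - 2_000              # other consumption
            if liquidity < 0:                  # household forecloses
                break
        else:
            survived += 1
    return survived / n_households

# Outer layer: cheapest rate cut that meets a resilience target of 90%.
best_cut = min((cut for cut in (0.0, 0.1, 0.2, 0.3) if simulate(cut) > 0.9),
               default=None)
```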


Hamiltonian Neural Networks for Robust Out-of-Time Credit Scoring

Marín, Javier

arXiv.org Artificial Intelligence

This paper introduces a novel Hamiltonian-inspired neural network approach to credit scoring, designed to address the challenges of class imbalance and out-of-time (OOT) prediction in financial risk management. Drawing from concepts in Hamiltonian mechanics, we develop a symplectic optimizer and a new loss function to capture the complex dynamics of credit risk evolution. Using the Freddie Mac Single-Family Loan-Level Dataset, we evaluate our model's performance against other machine learning approaches. Our method shows superior discriminative power in OOT scenarios, as measured by the Area Under the Curve (AUC), indicating better ranking ability and robustness to class imbalance. The Hamiltonian-inspired approach shows particular strength in maintaining consistent performance between in-sample and OOT test sets, suggesting improved generalization to future, unseen data. These findings suggest that physics-inspired techniques offer a promising direction for developing more robust and reliable credit scoring models, particularly in uncertain economic situations.
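To give a flavour of what a "symplectic optimizer" can look like, the sketch below applies a symplectic-Euler update, the simplest symplectic integrator, treating the negative loss gradient as a force on the parameters. The abstract does not specify the paper's actual integrator or its Hamiltonian-inspired loss, so this is a generic stand-in; the friction factor is an added assumption so the conservative dynamics dissipate and converge.

```python
# Symplectic-Euler parameter update: kick the velocity, then drift the
# position using the *new* velocity (this ordering is what makes it symplectic).
import torch

def symplectic_euler_step(params, velocities, loss_fn, lr=1e-2):
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, v, g in zip(params, velocities, grads):
            v.mul_(0.99)        # light friction so the dynamics converge (assumed)
            v.sub_(lr * g)      # kick: velocity update from the force -grad
            p.add_(lr * v)      # drift: position update with the new velocity
    return loss

# Usage on a toy quadratic loss:
w = torch.randn(3, requires_grad=True)
vel = [torch.zeros_like(w)]
for _ in range(200):
    symplectic_euler_step([w], vel, lambda: (w ** 2).sum())
```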


Temporal Relational Reasoning of Large Language Models for Detecting Stock Portfolio Crashes

Koa, Kelvin J. L., Ma, Yunshan, Ng, Ritchie, Zheng, Huanhuan, Chua, Tat-Seng

arXiv.org Artificial Intelligence

Stock portfolios are often exposed to rare but consequential events (e.g., the 2007 global financial crisis, the 2020 COVID-19 stock market crash) for which there is little historical information to learn from. Large Language Models (LLMs) now present a possible tool to tackle this problem, as they can generalize across their large corpus of training data and perform zero-shot reasoning on new events, allowing them to detect possible portfolio crash events without requiring specific training data. However, detecting portfolio crashes is a complex problem that requires more than basic reasoning abilities. Investors need to dynamically process the impact of each new piece of information found in news articles, analyze the relational network of impacts across news events and portfolio stocks, and understand the temporal context between impacts across time-steps, in order to obtain the overall aggregated effect on the target portfolio. In this work, we propose an algorithmic framework named Temporal Relational Reasoning (TRR). It seeks to emulate the spectrum of human cognitive capabilities used for complex problem-solving, including brainstorming, memory, attention and reasoning. Through extensive experiments, we show that TRR outperforms state-of-the-art solutions on detecting stock portfolio crashes, and we demonstrate through an ablation study how each of the proposed components contributes to its performance. Additionally, we explore possible applications of TRR by extending it to other related complex problems, such as the detection of possible global crisis events in macroeconomics.
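A hedged sketch of an LLM loop in the spirit of TRR follows: per time-step, score each news item's impact on each portfolio stock, propagate impacts over a simple relation graph, and keep a decayed memory of past impacts. `llm_score_impact` is a stub for an LLM call; the one-hop graph, the memory decay, and the crash threshold are invented here, not the paper's design.

```python
# Toy temporal-relational loop: direct impacts, relational spillover, memory.

def llm_score_impact(article: str, ticker: str) -> float:
    """Stub standing in for a zero-shot LLM impact judgment in [-1, 1]."""
    return 0.0  # replace with a real model call

def crash_signal(news_by_day, tickers, relations, threshold=-0.5):
    """news_by_day: list of article lists per day; relations: ticker -> neighbours."""
    memory = {t: 0.0 for t in tickers}           # temporal context per stock
    for day_articles in news_by_day:
        direct = {t: sum(llm_score_impact(a, t) for a in day_articles)
                  for t in tickers}
        for t in tickers:                        # relational propagation (one hop)
            spill = 0.5 * sum(direct[n] for n in relations.get(t, []))
            memory[t] = 0.8 * memory[t] + direct[t] + spill  # decayed memory
    portfolio_score = sum(memory.values()) / len(tickers)
    return portfolio_score < threshold           # True = possible crash
```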


A Spatio-Temporal Machine Learning Model for Mortgage Credit Risk: Default Probabilities and Loan Portfolios

Kündig, Pascal, Sigrist, Fabio

arXiv.org Artificial Intelligence

We introduce a novel machine learning model for credit risk by combining tree-boosting with a latent spatio-temporal Gaussian process model accounting for frailty correlation. This allows for modeling non-linearities and interactions among predictor variables in a flexible data-driven manner and for accounting for spatio-temporal variation that is not explained by observable predictor variables. We also show how estimation and prediction can be done in a computationally efficient manner. In an application to a large U.S. mortgage credit risk data set, we find that both predictive default probabilities for individual loans and predictive loan portfolio loss distributions obtained with our novel approach are more accurate compared to conventional independent linear hazard models and also linear spatio-temporal models. Using interpretability tools for machine learning models, we find that the likely reasons for this outperformance are strong interaction and non-linear effects in the predictor variables and the presence of large spatio-temporal frailty effects.
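A sketch of combining tree-boosting with a latent spatial Gaussian process is shown below, written against the open-source GPBoost library (authored by one of the paper's authors). Treat the exact arguments as assumptions based on GPBoost's documented LightGBM-style API rather than the paper's released code; for brevity the latent process here is spatial only, whereas the paper also models the temporal dimension.

```python
# Hedged sketch: tree-boosting with a latent GP frailty term via GPBoost.
import numpy as np
import gpboost as gpb

n = 500
X = np.random.rand(n, 5)                  # loan-level predictor variables
coords = np.random.rand(n, 2)             # spatial coordinates (e.g., lon/lat)
y = np.random.binomial(1, 0.1, n)         # default indicator

# Latent spatial GP with a probit link for the binary default outcome.
gp_model = gpb.GPModel(gp_coords=coords, cov_function="exponential",
                       likelihood="bernoulli_probit")
data = gpb.Dataset(X, label=y)
bst = gpb.train(params={"learning_rate": 0.05, "max_depth": 3},
                train_set=data, gp_model=gp_model, num_boost_round=100)

# Predicted default probabilities and variances at the training locations;
# the variances feed portfolio loss distributions as in the paper.
pred = bst.predict(data=X, gp_coords_pred=coords, predict_var=True)
```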